Model Selection

Dense Feature Extraction

# Dense Feature Extraction

Vit So400m Patch16 Siglip Gap 384.v2 Webli

A ViT image encoder based on SigLIP 2, utilizing global average pooling, with the attention pooling head removed, suitable for image feature extraction tasks.

Image Classification

Vit So400m Patch16 Siglip Gap 256.v2 Webli

ViT image encoder based on SigLIP 2, using global average pooling, with attention pooling head removed, suitable for image feature extraction tasks.

Vit So400m Patch16 Siglip 384.v2 Webli

Vision Transformer model based on SigLIP 2, designed for image feature extraction, pre-trained on the webli dataset

Vit So400m Patch16 Siglip 256.v2 Webli

SigLIP 2 ViT model, containing only the image encoder part for image feature extraction, trained on the WebLI dataset.

Vit So400m Patch14 Siglip Gap 378.v2 Webli

Vision Transformer model based on SigLIP 2 architecture, pre-trained on WebLI dataset, with attention pooling head removed and global average pooling applied

Image Classification

Vit So400m Patch14 Siglip Gap 224.v2 Webli

A ViT image encoder based on SigLIP 2, employing global average pooling with the attention pooling head removed, suitable for image feature extraction tasks.

Image Classification

Vit So400m Patch14 Siglip 378.v2 Webli

Vision Transformer model based on SigLIP 2, designed for image feature extraction, trained on the webli dataset

Vit So400m Patch14 Siglip 224.v2 Webli

A Vision Transformer model based on SigLIP 2 architecture, designed for image feature extraction and pretrained on the webli dataset.

Image Classification

Vit Large Patch16 Siglip Gap 512.v2 Webli

A vision Transformer model based on SigLIP 2 architecture, designed for image feature extraction, using Global Average Pooling (GAP) instead of attention pooling head

Image Classification

Vit Large Patch16 Siglip Gap 384.v2 Webli

A vision Transformer model based on the SigLIP 2 architecture, featuring a Global Average Pooling (GAP) variant that removes the attention pooling head, suitable for image feature extraction tasks.

Vit Large Patch16 Siglip 512.v2 Webli

ViT image encoder based on SigLIP 2, designed for timm, suitable for vision-language tasks

Image Classification

Vit Large Patch16 Siglip 384.v2 Webli

A vision Transformer model based on the SigLIP 2 architecture, designed for image feature extraction, pretrained on the webli dataset

Vit Large Patch16 Siglip 256.v2 Webli

Vision Transformer model based on SigLIP 2 architecture, designed for image feature extraction, trained on the webli dataset

Image Classification

Vit Giantopt Patch16 Siglip Gap 384.v2 Webli

A ViT image encoder based on SigLIP 2, utilizing global average pooling and removing the attention pooling head, suitable for image feature extraction tasks.

Image Classification

Vit Giantopt Patch16 Siglip Gap 256.v2 Webli

SigLIP 2 ViT image encoder, using global average pooling, with attention pooling head removed, designed specifically for timm

Image Classification

Vit Giantopt Patch16 Siglip 384.v2 Webli

ViT image encoder based on SigLIP 2, designed for timm, suitable for vision-language tasks

Image Classification

Vit Giantopt Patch16 Siglip 256.v2 Webli

Vision Transformer model based on SigLIP 2 technology, focused on image feature extraction

Vit Base Patch32 Siglip Gap 256.v2 Webli

A vision Transformer model based on SigLIP 2, using Global Average Pooling (GAP) instead of attention pooling head for image encoding

Vit Base Patch32 Siglip 256.v2 Webli

Vision Transformer model based on SigLIP 2 architecture, designed for image feature extraction

Vit Base Patch16 Siglip Gap 512.v2 Webli

A ViT image encoder based on SigLIP 2, using global average pooling with the attention pooling head removed, suitable for image feature extraction tasks.

Image Classification

Vit Base Patch16 Siglip Gap 384.v2 Webli

ViT image encoder based on SigLIP 2, using Global Average Pooling (GAP) instead of attention pooling head, suitable for image feature extraction tasks.

Image Classification

Vit Base Patch16 Siglip Gap 256.v2 Webli

A ViT image encoder based on SigLIP 2, employing global average pooling with the attention pooling head removed, suitable for image feature extraction.

Multimodal Fusion

Vit Base Patch16 Siglip Gap 224.v2 Webli

Vision Transformer model based on SigLIP 2, utilizing global average pooling for image features

Image Classification

Vit Base Patch16 Siglip 512.v2 Webli

Vision Transformer model based on SigLIP 2, designed for image feature extraction, pre-trained on the webli dataset

Vit Base Patch16 Siglip 384.v2 Webli

Vision Transformer model based on SigLIP 2, designed for image feature extraction, pre-trained on the webli dataset

Vit Base Patch16 Siglip 256.v2 Webli

A ViT image encoder based on SigLIP 2 for extracting image features, supporting multilingual vision-language tasks.

Vit Base Patch16 Siglip 224.v2 Webli

ViT model based on SigLIP 2, focused on image feature extraction, trained on the webli dataset

Vit So400m Patch16 Siglip Gap 512.v2 Webli

A ViT image encoder based on SigLIP 2, utilizing global average pooling, suitable for vision-language tasks.

Vit SO400M 14 SigLIP2

A SigLIP 2 vision-language model trained on the WebLI dataset, suitable for zero-shot image classification tasks.

Vit L 16 SigLIP2 384

A SigLIP 2 vision-language model trained on the WebLI dataset, suitable for zero-shot image classification tasks.

Vit B 16 SigLIP2

A SigLIP 2 vision-language model trained on the WebLI dataset, suitable for zero-shot image classification tasks.

Siglip2 So400m Patch16 Naflex

SigLIP 2 is an improved model based on the SigLIP pre-training objective, integrating multiple technologies to enhance semantic understanding, localization, and dense feature extraction capabilities.

Siglip2 Base Patch16 Naflex

SigLIP 2 is a multilingual vision-language encoder that integrates SigLIP's pretraining objectives and introduces new training schemes, enhancing semantic understanding, localization, and dense feature extraction capabilities.

Siglip2 So400m Patch16 512

SigLIP 2 is a vision-language model based on SigLIP, enhanced with improved semantic understanding, localization, and dense feature extraction capabilities.

Siglip2 So400m Patch16 384

SigLIP 2 is an improved model based on the SigLIP pre-training objective, integrating multiple technologies to enhance semantic understanding, localization, and dense feature extraction capabilities.

Siglip2 So400m Patch16 256

SigLIP 2 is an improved model based on SigLIP, integrating multiple technologies to enhance semantic understanding, localization, and dense feature extraction capabilities.

Siglip2 Giant Opt Patch16 256

SigLIP 2 is an advanced vision-language model that integrates multiple technologies to enhance semantic understanding, localization, and dense feature extraction capabilities.

Siglip2 Large Patch16 384

SigLIP 2 is an improved multilingual vision-language encoder based on SigLIP, enhancing semantic understanding, localization, and dense feature extraction capabilities.

Siglip2 Large Patch16 256

SigLIP 2 is an improved vision-language model based on SigLIP, integrating multiple technologies to enhance semantic understanding, localization, and dense feature extraction capabilities.

Siglip2 Base Patch16 512

SigLIP 2 is a vision-language model that integrates multiple technologies to enhance semantic understanding, localization, and dense feature extraction capabilities.

Featured Recommended AI Models

AIbase

Empowering the Future, Your AI Solution Knowledge Base

English 简体中文繁體中文にほんご

© 2025AIbase